ggplot2

ggplot2 Overview

Why ggplot2?

Advantages of ggplot2

  • consistent underlying grammar of graphics (Wilkinson, 2005)
  • plot specification at a high level of abstraction
  • very flexible
  • theme system for polishing plot appearance

Grammar of Graphics

The basic idea: independently specify plot building blocks and combine them to create just about any kind of graphical display you want.

Building blocks of a graph include:

  • data
  • aesthetic mapping
  • geometric object
  • statistical transformations
  • faceting
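As a rough sketch of how these building blocks fit together, the snippet below labels each one in a single plot. It assumes the built-in mtcars data set as a stand-in (it is not the data used later in these slides), and the choice of a linear smoother is only for illustration.

```r
library(ggplot2)

# mtcars ships with R; it stands in for any data.frame (assumption for illustration)
p <- ggplot(data = mtcars,                       # data
            mapping = aes(x = wt, y = mpg)) +    # aesthetic mapping
  geom_point() +                                 # geometric object
  stat_smooth(method = "lm", formula = y ~ x) +  # statistical transformation
  facet_wrap(~ cyl)                              # faceting

print(p)
```

Each block can be swapped out independently, for example replacing geom_point() with geom_jitter() while leaving the data and mapping untouched.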

ggplot2 vs. Base Graphics

Compared to base graphics, ggplot2

  • is more verbose for simple / canned graphics
  • is less verbose for complex / custom graphics
  • does not have class-specific plotting methods the way base plot() does (data should always be in a data.frame)
  • uses a different system for adding plot elements
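One way to see the "different system for adding plot elements" is to draw the same scatterplot with a fitted line both ways. This is a minimal sketch using the built-in mtcars data, which is an assumption for illustration and not part of the original slides.

```r
library(ggplot2)

# Base graphics: plot() draws on the device, later calls add to it by side effect
plot(mtcars$wt, mtcars$mpg, xlab = "wt", ylab = "mpg")
abline(lm(mpg ~ wt, data = mtcars))   # adds a line to the existing plot

# ggplot2: build a plot object and add elements to it with +
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
print(p)
```

In the ggplot2 version nothing is drawn until the object is printed, so the plot can be stored, modified, and reused.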

Aesthetic Mapping

Aesthetics are the visual properties of a plot, the things that you can see. Examples include:

  • position (i.e., on the x and y axes)
  • color (“outside” color)
  • fill (“inside” color)
  • shape (of points)
  • linetype
  • size

Aesthetic mappings are set with the aes() function.
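The distinction between mapping an aesthetic to a variable and setting it to a fixed value can be sketched as follows. The example assumes the built-in mtcars data set purely for illustration.

```r
library(ggplot2)

# Mapped: colour varies with a column, so it goes inside aes()
mapped <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = factor(cyl)))

# Set: every point gets the same fixed colour, so it goes outside aes()
fixed <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(colour = "firebrick4")

print(mapped)
print(fixed)
```

Only the mapped version produces a legend, because only there does the colour carry information about the data.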

Geometric Objects (geom)

Geometric objects are the actual marks we put on a plot. Examples include:

  • points (geom_point)
  • lines (geom_line)
  • boxplot (geom_boxplot)

A plot must have at least one geom; there is no upper limit. You can add a geom to a plot using the + operator.
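A small sketch of how + accumulates geoms: each addition returns a new plot object with one more layer. The mtcars data and the object names here are assumptions chosen for illustration.

```r
library(ggplot2)

base_plot <- ggplot(mtcars, aes(wt, mpg))   # no geom yet: only the axes are drawn
one_geom  <- base_plot + geom_point()       # first layer
two_geoms <- one_geom + geom_rug()          # second layer on top of the first

# Count the layers stored in each plot object
layer_counts <- sapply(list(base_plot, one_geom, two_geoms),
                       function(p) length(p$layers))
print(layer_counts)
```

Because + returns a new object instead of modifying the old one, base_plot can be reused with different geoms.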

NCAA Basketball data

We will use data from the NCAA basketball tournament from 2011 to 2016.

library(readr)    # read_csv()
library(dplyr)    # filter() and %>%
library(ggplot2)
hoops <- read_csv('http://www.math.montana.edu/ahoegh/teaching/stat408/datasets/TourneyDetailedResults.csv')
hoops_2011 <- hoops %>% filter(Season >= 2011)
hoops_2011
## # A tibble: 402 x 34
##    Season Daynum Wteam Wscore Lteam Lscore Wloc  Numot  Wfgm  Wfga Wfgm3 Wfga3
##     <dbl>  <dbl> <dbl>  <dbl> <dbl>  <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1   2011    134  1155     70  1412     52 N         0    26    50     4    13
##  2   2011    134  1421     81  1114     77 N         1    27    54     4    12
##  3   2011    135  1427     70  1106     61 N         0    23    54     4    16
##  4   2011    135  1433     59  1425     46 N         0    20    59     9    24
##  5   2011    136  1139     60  1330     58 N         0    22    54     7    26
##  6   2011    136  1140     74  1459     66 N         0    24    61     6    22
##  7   2011    136  1153     78  1281     63 N         0    29    54     4    11
##  8   2011    136  1163     81  1137     52 N         0    32    66     9    24
##  9   2011    136  1196     79  1364     51 N         0    29    53     8    23
## 10   2011    136  1211     86  1385     71 N         0    28    52     9    15
## # … with 392 more rows, and 22 more variables: Wftm <dbl>, Wfta <dbl>,
## #   Wor <dbl>, Wdr <dbl>, Wast <dbl>, Wto <dbl>, Wstl <dbl>, Wblk <dbl>,
## #   Wpf <dbl>, Lfgm <dbl>, Lfga <dbl>, Lfgm3 <dbl>, Lfga3 <dbl>, Lftm <dbl>,
## #   Lfta <dbl>, Lor <dbl>, Ldr <dbl>, Last <dbl>, Lto <dbl>, Lstl <dbl>,
## #   Lblk <dbl>, Lpf <dbl>

Graphical Primitives: ggplot()

graph.a <- ggplot(data = hoops_2011, aes(Lfgm,Wfgm))
graph.a

Adding Geoms: geom_point()

graph.a + geom_point()

Adding Geoms: geom_smooth()

graph.a + geom_point() + 
  geom_smooth(method = 'loess', formula = 'y ~ x')

Adding Geoms: geom_rug()

graph.a + geom_point() + 
  geom_smooth(method = 'loess', formula = 'y ~ x') +
  geom_rug()

Adding Geoms: geom_density2d()

graph.a + geom_point() + 
  geom_smooth(method = 'loess', formula = 'y ~ x') +
  geom_rug() + geom_density2d()

Adding Geoms: geom_jitter()

graph.a + geom_rug() + geom_density2d() + geom_jitter()

Adding Labels: labs()

graph.a + geom_rug() + geom_density2d() +
  geom_jitter() + 
  labs(x = 'Losing Team Field Goals Made', 
       y = 'Winning Team Field Goals Made')

Scales: xlim() and ylim()

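# Use one shared upper limit so both axes cover the same range and no observation is dropped
fgm_max <- max(hoops_2011$Lfgm, hoops_2011$Wfgm)
graph.a + geom_rug() + geom_density2d() +
  geom_jitter() + 
  labs(x = 'Losing Team Field Goals Made', 
       y = 'Winning Team Field Goals Made') +
  xlim(c(0, fgm_max)) + ylim(c(0, fgm_max))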

Themes

A wide range of themes is available in ggplot2; see the theme overview.
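As a hedged illustration of the theme system, the sketch below applies two of the complete themes that ship with ggplot2 and then adjusts a single element with theme(). The mtcars data and the object names are assumptions for illustration only.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) + geom_point()

bw_plot <- p + theme_bw()                # a complete built-in theme
styled  <- p + theme_minimal() +         # another complete theme...
  theme(legend.position = "bottom")      # ...followed by a tweak to one element

print(bw_plot)
print(styled)
```

A complete theme such as theme_bw() replaces all theme settings, so any theme() tweaks should come after it in the chain.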

More about aes

graph.a + geom_jitter(col = 'firebrick4')

More about aes

graph.a + geom_jitter(aes(col = as.factor(Season)))

More about aes

graph.a + geom_jitter(aes(col = as.factor(Season)), size=3,alpha=.4)

More about aes: Comment

graph.a + 
  geom_jitter(aes(shape = as.factor(Season),col=Wscore),
              size=3,alpha=.4)

Faceting: Comment

graph.a + geom_point() + facet_wrap(~Season)

Faceting

graph.a + facet_wrap(~Season) + 
  geom_jitter(alpha=.5, aes(color=Wfgm3))

Seattle Housing Data Set

Use the Seattle Housing Data Set (http://math.montana.edu/ahoegh/teaching/stat408/datasets/SeattleHousing.csv) to create an interesting graphic. Include informative titles and labels, and add an annotation.

seattle_in <- read_csv('http://math.montana.edu/ahoegh/teaching/stat408/datasets/SeattleHousing.csv')
## Parsed with column specification:
## cols(
##   price = col_double(),
##   bedrooms = col_double(),
##   bathrooms = col_double(),
##   sqft_living = col_double(),
##   sqft_lot = col_double(),
##   floors = col_double(),
##   waterfront = col_double(),
##   sqft_above = col_double(),
##   sqft_basement = col_double(),
##   zipcode = col_double(),
##   lat = col_double(),
##   long = col_double(),
##   yr_sold = col_double(),
##   mn_sold = col_double()
## )

Exercise: ggplot2

Now use ggplot2 to create an interesting graph using the Seattle Housing data set.

Solution: ggplot2

seattle_in$zipcode <- as.factor(seattle_in$zipcode)
graph.a <- ggplot(data = seattle_in, aes(sqft_living, price))
graph.a + geom_jitter(aes(col = zipcode)) + 
  geom_smooth(method = 'loess', formula = 'y ~ x') +
  theme(plot.title = element_text(size = 8), 
        text = element_text(size = 6)) +
  ggtitle('Seattle Housing Sales: Price vs. Square Footage Living Space') + 
  ylab('Sales Price (million dollars)') + 
  xlab('Living Space (square feet)') +
  scale_y_continuous(breaks = seq(0, 7000000, by = 1000000),
                     labels = as.character(0:7)) +  
  # draw the box first so the label is layered on top of it
  annotate('rect', xmin = 0, xmax = 7250, ymin = 5500000, ymax = 6500000, alpha = .6) + 
  annotate('text', x = 3500, y = 6000000, 
           label = 'Housing price depends on zipcode', size = 2) +
  # annotate() draws the arrow once; geom_segment(aes(...)) would redraw it for every row
  annotate('segment', x = 3500, xend = 3500, y = 5500000, yend = 3000000,
           arrow = arrow(length = unit(0.5, 'cm')))